Information embedding method

ABSTRACT

A method of embedding a message vector (m) in a data set. The method is domain independent and comprises the steps of (i) performing a transformation (W) on a first data set (x) to produce a second data set (S), the second data set (S) consisting of a plurality of statistically mutually independent components (independent sources), (ii) selecting from the second data set (S) a subset of data components (V) which constitutes an embedding space (feature space) in which the message vector (m) is to be embedded, (iii) modifying the data subset (V) in a predetermined manner according to the message vector (m) to be embedded, whereby to embed the message vector (m) in the second data set (S), and (iv) performing a reverse transformation (A) on the second data set having the message vector embedded therein (Ŝ) to reproduce the first data set now having the message embedded therein (x̂, the watermarked covertext).

The present invention, in one aspect, relates to a method of embedding a message vector in a dataset (covertext). In particular, the present invention is concerned with robust and fragile watermarking.

Steganography, the art of information hiding, has entered a new phase in the last decade, with the growing use of digital media, the internet and the on-line trade in electronic information (I. Cox et al., Digital Watermarking: Principles & Practice, Morgan Kaufmann (2001)). Steganography covers a broad range of objectives from copyright protection, watermarking and fingerprinting to authentication and the embedding of subtitle information in video images. Although these applications share some common characteristics, they can be quite different in their objectives. Thus, watermarking is still a combination of science and art. Most of the methods employ established techniques, imported from a particular application domain, for devising watermarking schemes especially tailored and particularly suitable for that domain. This is reflected in the methods suggested for the watermark embedding process and the feature space chosen for this purpose.

In fragile watermarking, it is intended that any attack on the covertext results in destruction of the watermark (i.e. loss of information). In robust watermarking, the opposite is true, i.e. an attack on the covertext should leave the watermark intact.

The plethora of watermarking methods on offer and their narrow suitability to specific domains make it difficult to provide a principled, comprehensive theoretical approach to watermarking. Such an approach is a prerequisite to any optimisation scheme aimed at maximising the information embedding rate and the robustness against various attacks, and minimising the information degradation.

The general framework of a watermarking system is shown in FIG. 1. The message vector m (such as text or a serial number) is hidden (embedded) in the covertext vector x (for instance a digitised image), producing the watermarked covertext x̂. The watermarked covertext x̂ can be attacked, either maliciously or non-maliciously, resulting in the modified vector y; the attack itself is represented by the vector n. Decoding (message extraction) is carried out with or without the original covertext (termed private and blind watermarking respectively) to provide an estimate m̂ of the original message (watermark).

According to a first aspect of the present invention there is provided a method of embedding a message vector in a data set comprising the steps of:

-   (i) performing a transformation on a first data set to produce a second data set, the second data set consisting of a plurality of statistically mutually independent components,
-   (ii) selecting from the second data set a subset of data components which constitutes an embedding space in which the message vector is to be embedded,
-   (iii) modifying said data subset in a predetermined manner according to the message vector to be embedded, whereby to embed the message vector in the second data set, and
-   (iv) performing a reverse transformation on the second data set having the message vector embedded therein to reproduce the first data set now having the message embedded therein.

In the field of steganography, the dataset in which the message is to be embedded is usually referred to as a “covertext” and the covertext in which the message has been embedded is referred to as the “marked” or “watermarked” covertext. The independent data components making up the embedding space (or feature space) may be abbreviated to “independent components”, or are sometimes referred to as “independent sources”. References to such phrases should be construed accordingly.

The nature of the covertext is not limited, but it is preferably a digital image, audio data or video data.

The present invention relates to a new approach to watermarking which is substantially independent of the application domain. It is equally applicable to fragile and robust watermarking. It is based on embedding the message in a set of independent sources, derived from the covertext, through the use of constant mixing matrices. Different generative models may be used for identifying the set of independent sources. These sources, or a subset of them, span a feature space, also termed the embedding space. The mixing matrices may differ from one application domain to another, but the probability distributions of the sources themselves are almost uncorrelated with the application domain. The transformation of the covertext (first data set) into the statistically independent sources is often referred to as de-mixing, the reverse transformation being referred to as mixing.

The present invention is particularly suited to robust watermarking (i.e. the embedded message is intended to remain after an attack), although it can also be used in fragile watermarking.

Preferably, the independent sources selected in step (i) are identified by independent component analysis (A. Hyvärinen et al., Independent Component Analysis, John Wiley & Sons, NY (2001)), independent factor analysis (H. Attias, Neural Computation, 11, 803, 1998), a kernel based method (e.g. radial basis functions), a neural network or generative topographic mapping. Although said methods have not previously been proposed in the steganography field for robust watermarking, they are per se known in other unrelated technical fields. It will be readily apparent to the skilled person that once the independent sources have been identified, the transformation of step (i) is readily derivable.

The use of ICA assumes that the covertexts constitute a sufficiently uniform class so that a statistical model can be constructed on the basis of observations. It will be appreciated that a different model may need to be constructed for significantly different covertext groups.

This new approach is aimed at achieving a close to capacity information transmission rate for the embedded message by using close to Gaussian source distributions. A zero-mean i.i.d. (independent and identically distributed) Gaussian covertext has been shown to have the largest watermarking capacity of all ergodic covertexts, and its most malevolent additive attack is also known analytically. Thus, the generative model used to identify the independent sources should ideally include Gaussian-like sources to be used as the feature space for embedding the message (watermark). If, for instance, the source distribution is produced by ICA, which cannot include pure Gaussian source distributions (P. O. Hoyer et al., Network, 11, 191, 2000), the message is embedded in the source distributions which have the highest resemblance to a Gaussian.
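By way of illustration only, one possible way of ranking sources by their resemblance to a Gaussian is sketched below in Python with NumPy. The use of excess kurtosis as the Gaussianity measure, the function name `most_gaussian_sources` and the synthetic test sources are assumptions made for this sketch; the description does not prescribe a particular measure.

```python
import numpy as np

def most_gaussian_sources(S, k):
    """Return indices of the k rows of S whose excess kurtosis is closest to 0.

    S is an (M, T) array holding T samples of each of M sources.
    Excess kurtosis is 0 for a Gaussian; it is only one of several
    possible Gaussianity measures (an assumption of this sketch).
    """
    Z = (S - S.mean(axis=1, keepdims=True)) / S.std(axis=1, keepdims=True)
    excess_kurtosis = (Z ** 4).mean(axis=1) - 3.0
    return np.argsort(np.abs(excess_kurtosis))[:k]

# Synthetic sources: heavy-tailed, Gaussian and flat distributions.
rng = np.random.default_rng(1)
T = 20000
S = np.vstack([
    rng.laplace(size=T),         # excess kurtosis about +3
    rng.standard_normal(T),      # excess kurtosis about 0
    rng.uniform(-1, 1, size=T),  # excess kurtosis about -1.2
])
chosen = most_gaussian_sources(S, 1)
```

In this toy setting the Gaussian row is the one selected; with ICA-derived sources none would be exactly Gaussian, and the ranking would simply pick the closest ones.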

The embedding in step (iii) may be linear or non-linear. Suitable embedding techniques include Quantisation Index Modulation (QIM), with or without Distortion-Compensation (DC-QIM) (B. Chen et al., IEEE Trans. Inform. Theory, 47, 1423, 2001) and scaled bin encoding (A. Levy et al., HPL-2001-13, HP Laboratories Israel, technical report 2001). These (and others) are well known to the person skilled in steganography.

Preferably, the method includes the additional step, prior to step (iii), of encoding the message vector. More preferably, said encoding is achieved using Low Density Parity Check error correcting codes (T. Richardson et al., IEEE Trans. on Inform. Theory, 47, 619, 2001 and D. J. C. MacKay, IEEE Trans. on Inform. Theory, 45, 399, 1999). Such encoding increases robustness against attacks.

The first aspect of the present invention also resides in a carrier medium carrying a computer executable software program for controlling a computer to carry out the method of the first aspect of the present invention.

Preferably, the carrier medium is a storage medium, such as a floppy disk, CD-ROM, DVD or a computer hard drive. It will be understood, however, that the carrier medium may also be a transient carrier, e.g. an electrical or optical signal.

According to a second aspect of the present invention, there is provided a method of extracting a message vector embedded in a dataset in accordance with the first aspect of the invention, from a dataset which has been modified (attacked).

Preferably, said method comprises the steps of:

-   (i) applying the transformation to the modified dataset to produce a modified second dataset of statistically independent components, and
-   (ii) comparing each data component which constitutes the embedding space with the corresponding data component in the modified second data set, whereby to determine the message information content for each component of the modified dataset.

In cases where it is not known which specific data components of the data set have been used to embed the message vector, the method includes an additional step prior to step (ii) of identifying which data components constitute the embedding space.

Said method may be achieved by thresholding the independent components obtained from the modified dataset. For example, deviation of the modified data component from the corresponding data component of the original embedding space by more than a predetermined amount is registered as a message bit (e.g. above an upper threshold value corresponding to a “1” bit, and below a lower threshold corresponding to a “0” bit).

Alternatively, said method may be achieved using a principled probabilistic approach. For example, an approximation to the embedded message vector can be obtained by probabilistic modelling of the dataset modification (attack) process.

It will be understood that the method of the second aspect also relates to the extraction of the embedded message vector from an unmodified covertext.

The second aspect of the present invention also resides in a carrier medium carrying a computer executable software program for controlling a computer to carry out the method of the second aspect of the present invention.

Preferably, the carrier medium is a storage medium, such as a floppy disk, CD-ROM, DVD or a computer hard drive. It will be understood, however, that the carrier medium may also be a transient carrier, e.g. an electrical or optical signal.

Embodiments of the present invention will now be described by way of example only, with reference to the accompanying drawings, in which:

FIG. 1 is a schematic representation of a typical watermarking process,

FIG. 2 is a schematic representation of a watermarking process of the present invention,

FIG. 3 is a schematic representation of a preferred embodiment, in which a serial number is embedded in a digital image, and

FIGS. 4 to 6 illustrate graphically the performance of a watermarking method in accordance with the present invention relative to known watermarking methods for various attacks.

FIG. 2 shows a watermarking scheme based on independent sources identified by a generative model, in this instance using the ICA/IFA feature space. The variable x represents the N dimensional original covertext, transformed (box 1) to the M dimensional feature space S using the ICA de-mixing matrix W (M×N). The vector of selected coefficients V, representing a selected subset of independent sources (box 2), constitutes the space used for embedding the message m. Embedding of the message m (box 3) results in a modification of the vector V and the feature space S (denoted by V̂ and Ŝ respectively). The latter is optimised (box 4) to minimise the perceptible distortion; mixing the feature space coefficients Ŝ (box 5), using the mixing matrix A (N×M), results in a modified (i.e. watermarked) version x̂ of the original covertext.
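By way of illustration only, the flow of boxes 1, 2, 3 and 5 of FIG. 2 may be sketched in Python with NumPy as follows. The function name `embed_in_sources`, the random orthogonal matrix standing in for a learned ICA de-mixing matrix W, and the simple additive `modify` rule are all assumptions made for this sketch and are not taken from the embodiment; the distortion optimisation of box 4 is omitted.

```python
import numpy as np

def embed_in_sources(x, W, A, idx, modify):
    """De-mix, modify the selected sources, and re-mix.

    x      : covertext vector (length N)
    W, A   : de-mixing and mixing matrices
    idx    : indices of the sources forming the embedding space V
    modify : callable mapping V to its modified version
    """
    S = W @ x                    # box 1: transform into source space
    S_hat = S.copy()
    S_hat[idx] = modify(S[idx])  # boxes 2 and 3: select V and embed
    return A @ S_hat             # box 5: mix back to the covertext domain

rng = np.random.default_rng(0)
N = 8
# Stand-in for a trained de-mixing matrix: a random orthogonal matrix,
# so that the mixing matrix is simply its transpose (assumption).
W, _ = np.linalg.qr(rng.standard_normal((N, N)))
A = W.T
x = rng.standard_normal(N)
idx = np.array([1, 4])
m = np.array([1, 0])
# Toy embedding rule: shift each selected source by +0.5 or -0.5.
x_hat = embed_in_sources(x, W, A, idx, lambda V: V + 0.5 * (2 * m - 1))
```

De-mixing `x_hat` again with `W` shows that only the selected sources have changed, which is the reduced cross-interference property relied upon above.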

Considering each aspect of the process in more detail:

1. Identification and Transformation into Independent Components

The first aspect of the embedding process is choosing an appropriate space for the embedding process. Ideally, the method should be domain independent with minimal cross-interference between the embedded signal and other signal components. The space chosen in the present invention is that of statistically independent sources. The main reasoning is that if the various components (sources) are statistically independent then modifying one of them will have a minimal impact on the others, thus reducing the cross-interference between the embedded signal and the covertext. In addition, the independent components are almost uncorrelated with the application domain, as most of the information about the application domain is carried by the constant mixing matrix A, such that the original covertext x is obtained by x=AS, where S is the vector of statistically independent components and A is the transpose of the de-mixing matrix W.

The statistically independent components in the present embodiment are selected for the whole covertext. However, the components can be selected for a section of the covertext, as this may be more practical in some cases. For instance, it is more practical to consider patches of a digitised picture than the complete picture; this considerably speeds up the computation of the mixing matrices and the independent sources. Similarly, it may be more efficient and/or suitable to identify independent sources in a transformed version of the original covertext (e.g. a Fourier or wavelet transformation of the original covertext).

2. Selection of Sources for Embedding Space

The selection of sources may depend on some pre-determined measure; for example, sources may be selected that maximise the information capacity and minimise the covertext distortion. For instance, the information capacity measure may be defined as the Shannon entropy ratio between the message and covertext (T. Cover et al., Elements of Information Theory, John Wiley & Sons, NY (1991)); and the distortion measure may rely on a quadratic Euclidean distance between the original and watermarked covertext vectors and/or their mutual information (T. Cover et al., supra). In fragile watermarking, maximising the information capacity is less important, and the sources will be chosen accordingly.

Alternatively, the choice of sources can be randomised, thereby making it difficult for an attacker to identify and remove the watermark. In a modification, the predetermined selection and randomised selection approaches can be combined: an initial selection of sources is made based on an information measure (the lowest ranked information carrying sources are rejected, since these may well be inadvertently lost in, for example, legitimate compression), and the sources actually used for embedding are then chosen at random from those remaining.
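By way of illustration only, the combined selection may be sketched in Python with NumPy as follows. The function name `select_sources`, the per-source `scores` array and the use of an integer `key` to seed the random choice are hypothetical; the description leaves both the information measure and the randomisation mechanism open.

```python
import numpy as np

def select_sources(scores, n_reject, n_select, key):
    """Reject the n_reject lowest-scoring sources, then draw n_select
    of the remainder at random using a seed shared with the decoder."""
    order = np.argsort(scores)       # ascending: lowest ranked first
    candidates = order[n_reject:]    # drop the lowest ranked sources
    rng = np.random.default_rng(key)
    return np.sort(rng.choice(candidates, size=n_select, replace=False))

# Hypothetical information scores for six sources.
scores = np.array([0.1, 0.9, 0.05, 0.7, 0.4, 0.8])
idx = select_sources(scores, n_reject=2, n_select=3, key=1234)
```

Because the draw is seeded, a decoder holding the same `key` and scores reproduces the same index set, whereas an attacker without the key cannot readily tell which sources carry the message.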

3. Embedding Method

Various efficient linear and non-linear approaches have been suggested for hiding/embedding information and any of these may be used in the present invention. In the present case, QIM is used. This method is based on quantising the covertext real-valued independent source to some central value, followed by a quantised addition/subtraction representing the binary message bit. This is then modified by a prescribed noise template, making it difficult to identify the QIM embedding process and its parameters. In other embodiments, this latter step may be omitted. The space comprising the (modified) independent sources is then mixed to generate the watermarked covertext.
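By way of illustration only, a basic scalar binary QIM may be sketched in Python with NumPy as follows. The function names `qim_embed` and `qim_decode`, the offsets of ±δ/4 and the uniform test noise are assumptions made for this sketch; the noise template step described above is omitted.

```python
import numpy as np

def qim_embed(v, bits, delta):
    """Quantise each value onto the lattice associated with its bit.

    Bit 0 uses points at k*delta - delta/4, bit 1 at k*delta + delta/4,
    so the two lattices are interleaved with spacing delta/2.
    """
    d = (2 * np.asarray(bits) - 1) * delta / 4.0
    return delta * np.round((v - d) / delta) + d

def qim_decode(y, delta):
    """Nearest decoding: pick the bit whose lattice lies closest to y."""
    y = np.asarray(y, dtype=float)
    dist = []
    for b in (0, 1):
        d = (2 * b - 1) * delta / 4.0
        dist.append(np.abs(y - (delta * np.round((y - d) / delta) + d)))
    return (dist[1] < dist[0]).astype(int)

rng = np.random.default_rng(0)
delta = 2.0
v = rng.normal(scale=10.0, size=16)      # selected source values
bits = rng.integers(0, 2, size=16)       # message bits
v_hat = qim_embed(v, bits, delta)
# Bounded perturbation smaller than delta/4 in magnitude.
noise = rng.uniform(-0.2 * delta, 0.2 * delta, size=16)
decoded = qim_decode(v_hat + noise, delta)
```

In this sketch the embedding moves each value by at most δ/2, and any perturbation smaller than δ/4 in magnitude is decoded correctly, which illustrates the trade-off between distortion and robustness governed by the quantisation step.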

4. Encoding (Not Shown)

To make the embedded information more robust against attacks, the message is encoded prior to embedding, using Low Density Parity Check (LDPC) error correcting codes.

5. Message Extraction (Decoding)

The decoding problem can be viewed as a general inference task and may be carried out in various ways. For instance, it may be carried out by applying the de-mixing matrix to the attacked covertext to give the corrupted sources and thresholding these sources (i.e. setting thresholds around the selected source values for identifying the quantised message), or by principled probabilistic techniques. An optimal message estimation can be based on Bayesian methods employing a probabilistic model of the corruption process P(y|x̂); the latter may be approximated using standard modelling techniques (C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1995), e.g. a mixture of Gaussians. In this method, the message is estimated on the basis of the posterior P(m|y) (maximum a posteriori, MAP) or the marginal posterior P(m_i|y), ∀i (marginal posterior maximiser, MPM), with or without explicit knowledge of the original covertext and its properties.
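By way of illustration only, the two estimators may be written out as below. The factorisation shown assumes the private case, in which the original covertext is known so that the watermarked covertext x̂ is a deterministic function of the message m; the prior P(m) is likewise an assumption of this sketch.

```latex
\hat{m}^{\mathrm{MAP}} = \arg\max_{m} P(m \mid y),
\qquad
P(m \mid y) \;\propto\; P\bigl(y \mid \hat{x}(m)\bigr)\, P(m)

\hat{m}_i^{\mathrm{MPM}} = \arg\max_{m_i} \sum_{m \setminus m_i} P(m \mid y),
\qquad \forall i
```

The MAP estimate maximises the probability of recovering the whole message correctly, whereas the MPM estimate maximises the probability of each bit being correct taken separately.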

FIG. 3 shows a preferred embodiment of the invention based on embedding a message, for instance a binary string representing a serial number, in a digitised picture. The embedding process is carried out using computer A, which then delivers the data (covertext), either via communication lines or using a memory device (e.g. an optical or magnetic disc), to a customer. The covertext may be subjected to attacks. Information from the attacked version is extracted by computer B. It is also possible for computer B to extract information from the attacked version without knowledge of the original covertext or embedding method.

The new watermarking system is robust against various standard attacks. These attacks can be conveniently categorised under five main sub-headings (illustrated in the context of digital images):

-   A) Synchronisation attacks: geometric transformations such as rotation and flip;
-   B) Frame attacks: line/column omissions, resampling, scaling and mosaic (breaking the pictures into patches);
-   C) Content attacks: noising, blurring, sharpening, de-noising and signal processing;
-   D) Content information reduction: lossy compression, colour reduction and down-scaling; and
-   E) Collusion attacks: exploiting common information in watermarked signals.

To validate the method, experiments were carried out to compare the performance of the proposed approach (“domain independent watermarking”, DIW) to known watermarking methods. The covertexts used in these experiments were arbitrarily chosen to be digitised images. Watermarking parameters were optimised in all methods, and separately for each specific attack.

For comparison purposes, two other watermarking schemes have been tested under the same attacks and using the same embedding and decoding methods. Both methods operate in the discrete cosine transform (DCT) domain:

-   C1 This scheme is based on the DCT of the whole image, X, selecting a random coefficient set in which the message m is embedded using QIM.
-   C2 In the second scheme, the image is divided into contiguous patches. The DCT of each patch is used as covertext X. A set of coefficients is selected and then quantised for embedding m.

In both schemes, an inverse DCT is applied after message embedding to provide the watermarked image. It should be noted that local methods such as C2 and DIW (as applied in this case) are much more computationally efficient than global methods such as C1.

The experiments involved attacking the watermarked pictures by:

-   a) white noise (WN) of mean zero and of various standard deviation values;
-   b) JPEG lossy compression with different quality levels; and
-   c) resizing with various factors.

These attacks are, arguably, the most common attacks (e.g. the most common type of noise and compression standard) and are therefore frequently used as a benchmark in this field. The set of images used comprised eleven grey-scale pictures representing natural, as opposed to computer generated, scenes. The experiments were carried out ten times for each set of parameters for each picture, providing both mean performance and error bars on the measurements.

Each algorithm embeds, using a quantisation method characterised by a quantisation step δ, a message m of length 1024 bits with a maximum distortion of 38 dB. The distortion induced by the watermarking systems was measured by the peak signal to noise ratio (PSNR). A simple decoding scheme based on nearest decoding was also used for all systems. The Table below summarises the parameters used in the experiments. In each of FIGS. 4 to 6, solid lines represent mean values for the experiments, and dashed lines either side represent the error bars.

TABLE: Parameters for watermarking methods according to attack applied (Coef. Rg. = coefficient range)

                                  Noise             JPEG              Resizing
Scheme  Transform  Patch Size     Coef. Rg.   δ     Coef. Rg.    δ    Coef. Rg.  δ
DIW     ICA        16 by 16       38-50       155   6-10         36   6-10       36
C1      DCT        -              101-1124    70    2081-20624   70   2-1985     70
C2      DCT        16 by 16       6-23        80    2-19         80   4-18       80
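By way of illustration only, the PSNR measure referred to above may be computed as sketched below in Python with NumPy. The function name `psnr` and the peak value of 255 (i.e. 8-bit grey-scale images) are assumptions made for this sketch; the description does not state the image bit depth.

```python
import numpy as np

def psnr(original, marked, peak=255.0):
    """Peak signal to noise ratio in dB between two equally sized images."""
    a = np.asarray(original, dtype=float)
    b = np.asarray(marked, dtype=float)
    mse = np.mean((a - b) ** 2)       # mean squared error
    if mse == 0:
        return float("inf")           # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy example: every pixel of a 4x4 image is changed by one grey level.
img = np.zeros((4, 4))
value = psnr(img, img + 1.0)          # equals 20*log10(255)
```

A higher PSNR means less distortion, so a distortion limit of 38 dB corresponds to requiring that the PSNR of the watermarked image does not fall below that figure.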

FIG. 4 shows that all schemes are reasonably robust, considering that the 38 dB attack distortion threshold is reached for a standard deviation of about 3. It also shows that DIW is the most robust method of those examined for a WN attack. In the case of DIW and the decoding method used, it is easy to see a direct relation between δ and the robustness of the process, since the noise in the feature space is also Gaussian. This may not be the case if other decoding methods, such as the Bayesian approach, are used. Moreover, it also shows that one potential weakness of the DIW scheme, the ICA restriction of extracting only non-Gaussian sources, is not highly significant, even in the case of a Gaussian noise attack.

FIG. 5 shows that all the tested methods are reasonably robust against JPEG compression. However, for very low quality levels (under 15), performance decreases significantly and is less stable, as shown by the error bars (error correcting codes (ECC) may be employed at low error rates to improve the performance). Furthermore, the threshold of 38 dB distortion is reached at a quality level of about 90. DIW achieved the best results on average.

FIG. 6 shows excellent performance for C1 under resizing attacks. DIW and C2 achieved excellent results for a resizing factor greater than 0.5, but their performance decreased significantly for stronger attacks. Intuitively this can be explained by the localised nature of the patches used. It is expected that ECC will allow perfect retrieval for a resizing factor down to 0.375; lower factors will severely affect the capacity of these schemes and the picture quality. For a 0.25 resizing factor, the picture size is reduced by more than 98% in storage.

The method of the present invention exhibits several advantages in comparison with existing techniques. Firstly, being domain independent, it may be adapted easily to different watermarking tasks. Secondly, the source selection mechanism enables a close to optimal covertext (in feature space) to be chosen and reduces the distortion in the original covertext. Thirdly, encoding the message prior to the embedding operation, using state of the art error-correcting codes, increases its robustness against attacks. Finally, using principled probabilistic decoding techniques, based on modelling the attack, enables maximisation of the information extracted from the attacked covertext.

From the foregoing, it will be appreciated that the present invention is a highly efficient and highly robust domain independent watermarking system. The message embedding can be carried out easily and efficiently, such that the hidden message can be extracted fully and reliably from the attacked covertext. Any attack which successfully removes the watermark is likely to distort the covertext to an excessive extent, thereby depriving the attacker of any further use of the covertext (e.g. degraded audio files or digital images).

1. A method of embedding a message vector in a data set comprising: (i) performing a transformation on a first data set to produce a second data set, the second data set consisting of a plurality of statistically mutually independent components, (ii) selecting from the second data set a subset of data components which constitutes an embedding space in which the message vector is to be embedded, (iii) modifying said data subset in a predetermined manner according to the message vector to be embedded, whereby to embed the message vector in the second data set, and (iv) performing a reverse transformation on the second data set having the message vector embedded therein to reproduce the first data set now having the message embedded therein.
2. The method of claim 1, wherein the first data set is selected from a digital image, audio data or video data.
3. The method of claim 1, wherein the independent components of step (i) are identified by independent component analysis, independent factor analysis, a kernel based method such as radial basis functions, a neural network or generative topographic mapping.
4. The method of claim 1, wherein the subset of independent components selected in step (ii) are selected randomly or in accordance with a predetermined measure or a combination thereof.
5. The method of claim 4, wherein the predetermined measure is a combination of an information measure and a distortion measure, said measures selected to maximise the information capacity of the subset of independent components while minimising the distortion on the first data set due to embedding of the message vector.
6. The method of claim 1, wherein the embedding method of step (iii) is selected from Quantisation Index Modulation, with or without Distortion-Compensation and scaled bin encoding.
7. The method of claim 1, further comprising an additional step, prior to step (iii), of encoding the message vector.
8. The method of claim 7, wherein said encoding is achieved using error correcting codes.
9. A method of extracting a message vector embedded in a data set, said data set possibly having been modified, the method of embedding said message vector in said data set comprising: (i) performing a transformation on a first data set to produce a second data set, the second data set consisting of a plurality of statistically mutually independent components, (ii) selecting from the second data set a subset of data components which constitutes an embedding space in which the message vector is to be embedded, (iii) modifying said data subset in a predetermined manner according to the message vector to be embedded, whereby to embed the message vector in the second data set, and (iv) performing a reverse transformation on the second data set having the message vector embedded therein to reproduce the first data set now having the message embedded therein.
10. The method of claim 9, comprising the steps of: (i) applying the transformation to the nominally modified data set to produce a nominally modified second data set of statistically independent components, and (ii) comparing each data component which constitutes the embedding space with the corresponding data component in the nominally modified second data set, whereby to determine the message information content for each component of the nominally modified data set.
11. The method of claim 10, comprising the additional step prior to step (ii) of identifying which data components constitute the embedding space.
12. The method of claim 11, comprising the step of thresholding the independent components obtained from the nominally modified data set to identify which data components constitute the embedding space.
13. The method of claim 12, wherein the message information content is determined by said thresholding.
14. The method as claimed in claim 10, wherein determination of the message information content is achieved using a principled probabilistic approach.
15. The method of claim 14, wherein the data set is known to have been modified and an approximation to the embedded message vector is obtained by the probabilistic modelling of the data set modification process.
16. A carrier medium carrying a computer executable software program for controlling a computer to carry out a method of embedding a message vector in a data set comprising: (i) performing a transformation on a first data set to produce a second data set, the second data set consisting of a plurality of statistically mutually independent components, (ii) selecting from the second data set a subset of data components which constitutes an embedding space in which the message vector is to be embedded, (iii) modifying said data subset in a predetermined manner according to the message vector to be embedded, whereby to embed the message vector in the second data set, and (iv) performing a reverse transformation on the second data set having the message vector embedded therein to reproduce the first data set now having the message embedded therein.
17. The carrier medium of claim 16, wherein said medium is at least one storage medium selected from the group consisting of: a floppy disk, CD-ROM, DVD, a computer hard drive, and a transient carrier.